Evaluating Concrete Strength Model Performance

Using Cross-Validation Methods

Sai Devarashetty, Mattick, Musson, Perez

2024-07-28

Introduction to Cross-Validation

  • Measure performance and generalizability of machine learning and predictive models.
  • Compare different models constructed from the same data set.

CV is widely used in fields including:

  • Machine Learning
  • Data Mining
  • Bioinformatics

Common uses:

  • Minimize overfitting
  • Ensure a model generalizes to unseen data
  • Tune hyperparameters

Definitions

Generalizability:
How well predictive models created from a sample fit other samples from the same population.

Overfitting:
When a model fits the training data too closely, capturing sample-specific artifacts rather than the underlying patterns.

Model fits characteristics specific to the training set:

  • Noise
  • Random fluctuations
  • Outliers

Hyperparameters:
Model configuration variables, for example:

  • Nodes and layers in a neural network
  • Branches in a decision tree

Process

Subset the data into K approximately equal-sized folds:

  • Randomly
  • Without replacement

(Song, Tang, and Wee 2021)

Split the subsets into test and training sets:

  • 1 test fold
  • K−1 training folds

  • Fit the model to the training data
  • Apply the fitted model to the test set
  • Measure the prediction error

Repeat K Times

  • Fit the model to each of the K combinations of K−1 folds
  • Use each fold as the test set exactly once

Calculate the mean error
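The steps above can be sketched in Python with scikit-learn; the data, model, and 5-fold setup below are illustrative stand-ins, not the study's actual pipeline.

```python
# Illustrative K-fold CV loop (assumes scikit-learn; synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))                       # 4 hypothetical predictors
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=100)

# Randomly partition into K folds, without replacement
kf = KFold(n_splits=5, shuffle=True, random_state=42)
errors = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # fit on K-1 folds
    pred = model.predict(X[test_idx])                           # apply to held-out fold
    errors.append(mean_squared_error(y[test_idx], pred))        # prediction error

cv_error = float(np.mean(errors))  # mean error over the K repetitions
```

Each observation is used for testing exactly once, and the K per-fold errors are averaged into a single CV estimate.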

Bias-Variance Trade-Off


K-Fold vs. LOOCV

Method   Computation   Bias           Variance
K-Fold   Lower         Intermediate   Lower
LOOCV    Highest       Unbiased       High

K-fold where K = 5 or K = 10 is recommended:

  • Lower computational cost
  • Does not show excessive bias
  • Does not show excessive variance

(James et al. 2013), (Gorriz et al. 2024)


Model Measures of Error (MOE)

  • Measure the quality of fit of a model
  • Measuring error is a critical data modeling step
  • Different MOE for different data types

By measuring the quality of fit we can select the model that generalizes best.

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^n |y_i - \hat{f}(x_i)| \tag{1} \]

  • A measure of error magnitude
  • The sign does not matter (absolute value)
  • Lower magnitude indicates a better fit
  • Take the mean absolute difference between:
    • observed \((y_i)\) and predicted \(\hat{f}(x_i)\) values
  • \(n\) is the number of observations
  • \(\hat{f}(x_i)\) is the model prediction \(\hat{f}\) for the \(i\)-th observation
  • \(y_i\) is the observed value

\[ \text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{f}(x_i))^2} \tag{2} \]

  • A measure of error magnitude
  • Lower magnitude indicates a better fit
  • Error is weighted:
    • Squaring the errors gives more weight to larger ones
    • Taking the square root returns the error to the same units as the response variable

\[ \text{R}^2 = \frac{SS_{tot}-SS_{res}}{SS_{tot}} = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{f}(x_i))^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2} \tag{3} \]

  • Proportion of the variance explained by the predictor(s)
  • A higher value indicates a better fit
    • An \(R^2\) value of 0.75 indicates 75% of the variance in the response variable is explained by the predictor(s)

(James et al. 2013), (Hawkins, Basak, and Mills 2003), (Helsel and Hirsch 1993)
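The three measures of error can be computed directly with scikit-learn; the observed and predicted values below are made-up numbers for illustration only.

```python
# MAE (Eq. 1), RMSE (Eq. 2), and R^2 (Eq. 3) via scikit-learn metrics.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_obs = np.array([30.0, 45.0, 25.0, 60.0])   # illustrative observed strengths (MPa)
y_pred = np.array([32.0, 43.0, 28.0, 55.0])  # illustrative model predictions (MPa)

mae = mean_absolute_error(y_obs, y_pred)           # mean |y_i - f_hat(x_i)|
rmse = np.sqrt(mean_squared_error(y_obs, y_pred))  # square, average, square-root
r2 = r2_score(y_obs, y_pred)                       # 1 - SS_res / SS_tot
```

Here the absolute errors are 2, 2, 3, and 5 MPa, so MAE = 3.0 MPa, while RMSE = √10.5 ≈ 3.24 MPa because the largest error is weighted more heavily.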

K-Fold Cross-Validation

\[ CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} \text{Measure of Error}_i \tag{4} \]

(James et al. 2013),(Browne 2000)

Leave-One-Out Cross-Validation (LOOCV)

\[ CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \text{Measure of Error}_i \tag{5} \]

(James et al. 2013),(Browne 2000)
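LOOCV is simply K-fold with K = n, so it can be sketched with the same scikit-learn machinery; the data below are a synthetic stand-in.

```python
# LOOCV sketch: each of the n observations serves as the test set once.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                 # 30 observations, 2 predictors
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=30)

# One score per left-out observation; sklearn negates errors so that
# "greater is better" holds for all its scorers.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_absolute_error")
loocv_mae = -scores.mean()                   # Eq. 5 with MAE as the measure of error
```

The model is refit n times, which is why LOOCV has the highest computational cost in the comparison table above.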

Nested Cross-Validation

  • An inner CV loop tunes hyperparameters within each training set
  • An outer CV loop estimates generalization error on held-out folds
  • Keeps hyperparameter tuning from biasing the error estimate

(Berrar et al. 2019)
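A minimal sketch of the two-loop structure, assuming scikit-learn; the ridge regression, the alpha grid, and the synthetic data are illustrative choices, not the study's configuration.

```python
# Nested CV: GridSearchCV is the inner loop (hyperparameter tuning),
# cross_val_score is the outer loop (generalization-error estimate).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(scale=0.2, size=60)

inner = GridSearchCV(Ridge(),
                     param_grid={"alpha": [0.01, 0.1, 1.0]},   # hypothetical grid
                     cv=KFold(n_splits=3, shuffle=True, random_state=1))
outer_scores = cross_val_score(inner, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=1),
                               scoring="neg_root_mean_squared_error")
nested_rmse = -outer_scores.mean()  # tuning never sees the outer test folds
```

Because the grid search is refit inside every outer fold, the reported error reflects the whole tuning-plus-fitting procedure rather than one lucky hyperparameter choice.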

Study Data

(I-C Yeh 1998) modeled the compressive strength of high-performance concrete (HPC) at various ages and made with different ratios of components. The data used for the study were made publicly available and can be downloaded from the UCI Machine Learning Repository (I-Cheng Yeh 2007).

Data Exploration and Visualization

  • Target variable:
    • Strength (MPa)
  • Predictor variables:
    • Cement (kg/m3)
    • Superplasticizer (kg/m3)
    • Age (days)
    • Water (kg/m3)

All variables are quantitative

Linear Regression Model

                   Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)        28.2578655   5.1878634     5.446918   1.0e-07
Cement              0.0668433   0.0039668    16.850539   0.0e+00
Superplasticizer    0.8716897   0.0903825     9.644449   0.0e+00
Age                 0.1110466   0.0069538    15.969235   0.0e+00
Water              -0.1195600   0.0257210    -4.648334   3.9e-06

\[ \widehat{\text{Strength}} = 28.258 + 0.067\,\text{Cement} + 0.872\,\text{Superplasticizer} + 0.111\,\text{Age} - 0.120\,\text{Water} \]
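A sketch of fitting this four-predictor linear model with scikit-learn. The data below are simulated from the reported coefficients purely for illustration; in practice the real UCI concrete data would be loaded instead.

```python
# Fit the four-predictor OLS model; synthetic data simulated from the
# coefficient estimates reported in the table above (illustration only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 200
# Columns: Cement, Superplasticizer, Age, Water (plausible ranges, assumed)
X = np.column_stack([rng.uniform(100, 500, n),   # Cement (kg/m3)
                     rng.uniform(0, 30, n),      # Superplasticizer (kg/m3)
                     rng.uniform(1, 365, n),     # Age (days)
                     rng.uniform(120, 250, n)])  # Water (kg/m3)
y = (28.258 + 0.067 * X[:, 0] + 0.872 * X[:, 1]
     + 0.111 * X[:, 2] - 0.120 * X[:, 3]
     + rng.normal(scale=1.0, size=n))            # add noise around the true line

model = LinearRegression().fit(X, y)
# model.intercept_ and model.coef_ recover estimates close to the simulated values
```

With enough observations and modest noise, the fitted coefficients land very close to the values used to generate the data, mirroring the Estimate column of the regression table.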

Linear Regression CV Results

  • K-Fold Results:

    Measure of Error   Result
    RMSE               12.13
    MAE                 9.23
    R2                  0.46

  • LOOCV Results:

    Measure of Error   Result
    RMSE               12.13
    MAE                 9.23
    R2                  0.46

  • Nested CV Results:

    Measure of Error   Result
    RMSE               11.87
    MAE                 9.43
    R2                  0.49

LightGBM Model


Measure of Error   Result
RMSE                8.73
MAE                 6.82
R2                  0.73



  • Ensemble of decision trees
  • Uses gradient boosting
  • Final prediction is the sum of predictions from all individual trees
  • Feature importance
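The bullets above can be illustrated with scikit-learn's GradientBoostingRegressor, used here as a stand-in for LightGBM (the same ensemble-of-trees, gradient-boosting idea, without requiring the lightgbm package); the data and parameters are illustrative.

```python
# Gradient boosting: an ensemble of decision trees fitted sequentially,
# each correcting the residuals of the trees before it.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))
# Nonlinear target driven only by the first two features
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=3).fit(X, y)

pred = gbm.predict(X[:5])                # sum of all trees' contributions
importance = gbm.feature_importances_    # relative importances, sum to 1
```

Because only the first two features drive the target, the importance scores concentrate on them, which is the kind of feature-importance signal the slide refers to.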

LightGBM CV Results

  • K-Fold Results:

    Measure of Error   Result
    RMSE                8.73
    MAE                 6.82
    R2                  0.73

  • LOOCV Results:

    Measure of Error   Result
    RMSE                5.93
    MAE                 4.32
    R2                  0.87

  • Nested CV Results:

    Measure of Error   Result
    RMSE                8.27
    MAE                 6.39
    R2                  0.75

Comparison of Models

  • Performance comparison: Linear Regression vs. LightGBM
  • Advantages and disadvantages of each model

Method   Measure of Error   Linear Regression   LightGBM
5-Fold   RMSE               12.13                8.73
5-Fold   MAE                 9.23                6.82
5-Fold   R2                  0.46                0.73
LOOCV    RMSE               12.13                5.93
LOOCV    MAE                 9.23                4.32
LOOCV    R2                  0.46                0.87
NCV      RMSE               11.87                8.27
NCV      MAE                 9.43                6.39
NCV      R2                  0.49                0.75

Model Comparison K-Fold Plot

Model Comparison LOOCV Plot

Model Comparison Nested CV Plot

Conclusion

  • Cross-validation techniques help detect and reduce overfitting, and LightGBM enhances model accuracy.
  • LightGBM offers superior accuracy and efficiency.
  • Identified key predictors for accurate model development.
  • Robust framework for model evaluation, improving decision-making in concrete design and construction.

Future Research

  • Further refinement of these techniques to improve predictive accuracy.
  • Exploration of additional advanced models.
  • Application in various engineering contexts to enhance model reliability and performance.

References

All figures were created by the authors.

Berrar, Daniel et al. 2019. “Cross-Validation.”
Browne, Michael W. 2000. “Cross-Validation Methods.” Journal of Mathematical Psychology 44 (1): 108–32.
Gorriz, Juan M, Fermı́n Segovia, Javier Ramirez, Andrés Ortiz, and John Suckling. 2024. “Is k-Fold Cross Validation the Best Model Selection Method for Machine Learning?” arXiv Preprint arXiv:2401.16407.
Hawkins, Douglas M, Subhash C Basak, and Denise Mills. 2003. “Assessing Model Fit by Cross-Validation.” Journal of Chemical Information and Computer Sciences 43 (2): 579–86.
Helsel, Dennis R, and Robert M Hirsch. 1993. Statistical Methods in Water Resources. Elsevier.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. 2013. An Introduction to Statistical Learning. Vol. 112. Springer.
Song, Q Chelsea, Chen Tang, and Serena Wee. 2021. “Making Sense of Model Generalizability: A Tutorial on Cross-Validation in r and Shiny.” Advances in Methods and Practices in Psychological Science 4 (1): 2515245920947067.
Yeh, I-C. 1998. “Modeling of Strength of High-Performance Concrete Using Artificial Neural Networks.” Cement and Concrete Research 28 (12): 1797–1808.
Yeh, I-Cheng. 2007. “Concrete Compressive Strength.” UCI Machine Learning Repository.